Variational analysis of spectral functions simplified


D. Drusvyatskiy    C. Kempton

Abstract. Spectral functions of symmetric matrices - those depending on matrices
only through their eigenvalues - appear often in optimization. A cornerstone
variational analytic tool for studying such functions is a formula relating their
subdifferentials to the subdifferentials of their diagonal restrictions. This paper
presents a new, short, and revealing derivation of this result. We then round off
the paper with an illuminating derivation of the second derivative of $C^2$-smooth
spectral functions, highlighting the underlying geometry. All of our arguments have
direct analogues for spectral functions of Hermitian matrices, and for singular
value functions of rectangular matrices.
1 Introduction

This work revolves around spectral functions. These are functions on the space of
$n \times n$ symmetric matrices $\mathbf{S}^n$ that depend on matrices only through
their eigenvalues, that is, functions that are invariant under the action of the
orthogonal group by conjugation. Spectral functions can always be written in a
composite form $f \circ \lambda$, where $f$ is a permutation-invariant function on
$\mathbf{R}^n$ and $\lambda$ is a mapping assigning to each matrix $X$ the vector of
eigenvalues $(\lambda_1(X), \ldots, \lambda_n(X))$ in nonincreasing order.

A pervasive theme in the study of such functions is that various variational
properties of the permutation-invariant function $f$ are inherited by the induced
spectral function $f \circ \lambda$; see e.g. [TH^ ITGlfTS]. Take convexity for
example. Supposing that $f$ is closed and convex, the main result of [7] shows that
the Fenchel conjugate of $f \circ \lambda$ admits the elegant representation

\[ (f \circ \lambda)^* = f^* \circ \lambda. \tag{1.1} \]

An immediate conclusion is that $f \circ \lambda$ agrees with its double conjugate
and is therefore convex, that is, convexity of $f$ is inherited by the spectral
function $f \circ \lambda$. An elegant characterization of the subdifferential
$\partial (f \circ \lambda)(X)$ in terms of $\partial f(\lambda(X))$ then readily
follows [7, Theorem 3.1], an important result for optimization specialists.

In a follow up paper [8], Lewis showed that even for nonconvex functions $f$, the
following exact relationship holds:

\[ \partial (f \circ \lambda)(X) = \{ U(\mathrm{Diag}\, v)U^T : v \in \partial f(\lambda(X)),\ U \in O^n_X \}, \tag{1.2} \]

where

\[ O^n_X := \{ U \in O^n : X = U(\mathrm{Diag}\, \lambda(X))U^T \}. \]

Here, the symbol $O^n$ denotes the group of orthogonal matrices and the symbols
$\partial (f \circ \lambda)$ and $\partial f$ may refer to the Fréchet, limiting, or
Clarke subdifferentials; see e.g. [H] for the relevant definitions. Thus calculating
the subdifferential of the spectral function $f \circ \lambda$ on $\mathbf{S}^n$
reduces to computing the subdifferential of the usually much simpler function $f$ on
$\mathbf{R}^n$. For instance, subdifferential computation of the $k$'th largest
eigenvalue function $X \mapsto \lambda_k(X)$ amounts to analyzing a piecewise
polyhedral function, the $k$'th order statistic on $\mathbf{R}^n$ [8, Section 9].
Moreover, the subdifferential formula allows one to gauge the underlying geometry of
spectral functions, through their "active manifolds" [1], for example.

In striking contrast to the convex case [7], the proof of the general subdifferential
formula (1.2) requires much finer tools, and is less immediate to internalize. This
paper presents a short, elementary, and revealing derivation of equation (1.2) that
is no more involved than its convex counterpart. Here's the basic idea. Consider
the Moreau envelope

\[ f_\alpha(x) := \inf_{y} \Big\{ f(y) + \frac{1}{2\alpha}|x - y|^2 \Big\}. \]

Similar notation will be used for the envelope of $f \circ \lambda$. In direct analogy
to equation (1.1), we will observe that the Moreau envelope satisfies the equation

\[ (f \circ \lambda)_\alpha = f_\alpha \circ \lambda, \]

and derive a convenient formula for the corresponding proximal mapping. The case
when $f$ is an indicator function was treated in [2], and the argument presented here
is a straightforward adaptation, depending solely on the Theobald-von Neumann
inequality [19, 20]. The key observation now is independent of the eigenvalue
setting: membership of a vector $v$ in the proximal or in the Fréchet subdifferential
of any function $g$ at a point $x$ is completely determined by the local behavior of
the univariate function $\alpha \mapsto g_\alpha(x + \alpha v)$ near the origin. The
proof of the subdifferential formula (1.2) quickly flows from there. It is interesting
to note that the argument uses very little information about the properties of the
eigenvalue map, with the exception of the Theobald-von Neumann inequality.
Consequently, it applies equally well in a more general algebraic setting of certain
isometric group actions, encompassing also an analogous subdifferential formula for
functions of singular values derived in iminiiig; a discussion can be found in the
appendix. A different Lie theoretic approach in the convex case appears in [9].

We complete the paper by reconsidering the second-order theory of spectral
functions. In [10, 16, 17], the authors derived a formula for the second derivative
of a $C^2$-smooth spectral function. In its simplest form it reads

\[ \nabla^2 F(\mathrm{Diag}\, a)[B] = \mathrm{Diag}\big(\nabla^2 f(a)\,\mathrm{diag}(B)\big) + \mathcal{A} \circ B, \]

where $\mathcal{A} \circ B$ is the Hadamard product and

\[ \mathcal{A}_{ij} = \begin{cases} \dfrac{\nabla f(a)_i - \nabla f(a)_j}{a_i - a_j} & \text{if } a_i \neq a_j, \\[2mm] \nabla^2 f(a)_{ii} - \nabla^2 f(a)_{ij} & \text{if } a_i = a_j. \end{cases} \]

This identity is quite mysterious, and its derivation is quite opaque geometrically.
In the current work, we provide a transparent derivation, making clear the role of
the invariance properties of the gradient graph. To this end, we borrow some ideas
from im, while giving them a geometric interpretation.

The outline of the manuscript is as follows. Section 2 records some basic notation
and an important preliminary result about the Moreau envelope (Lemma 2.1). Section 3
contains background material on orthogonally invariant functions. Section 4
describes the derivation of the subdifferential formula and Section 5 focuses on the
second-order theory - the main results of the paper.


2 Notation 

This section briefly records some basic notation, following closely the monograph
[IT]. The symbol $\mathbf{E}$ will always denote a Euclidean space (finite-dimensional
real inner product space) with inner product $\langle \cdot, \cdot \rangle$ and
induced norm $|\cdot|$. A closed ball of radius $\varepsilon > 0$ around a point $x$
will be denoted by $B_\varepsilon(x)$. The closure and the convex hull of a set $Q$
in $\mathbf{E}$ will be denoted by $\mathrm{cl}\, Q$ and $\mathrm{conv}\, Q$,
respectively.

Throughout, we will consider functions $f$ on $\mathbf{E}$ taking values in the
extended real line $\overline{\mathbf{R}} := \mathbf{R} \cup \{\pm\infty\}$. For such
a function $f$ and a point $x$, with $f(x)$ finite, the proximal subdifferential
$\partial_p f(x)$ consists of all vectors $v \in \mathbf{E}$ such that there exist
constants $r > 0$ and $\varepsilon > 0$ satisfying

\[ f(y) \ge f(x) + \langle v, y - x \rangle - \frac{r}{2}|y - x|^2 \quad \text{for all } y \in B_\varepsilon(x). \]

Whenever $f$ is $C^2$-smooth near $x$, the proximal subdifferential $\partial_p f(x)$
consists only of the gradient $\nabla f(x)$. A function $f$ is said to be
prox-bounded if it majorizes some quadratic function. In particular, all
lower-bounded functions are prox-bounded. For prox-bounded functions, the inequality
in the definition of the proximal subdifferential can be taken to hold globally at
the cost of increasing $r$ [HJ Proposition 8.46]. The Fréchet subdifferential of $f$
at $x$, denoted $\hat{\partial} f(x)$, consists of all vectors $v \in \mathbf{E}$
satisfying

\[ f(y) \ge f(x) + \langle v, y - x \rangle + o(|y - x|). \]

Here, as usual, $o(|y - x|)$ denotes any term satisfying
$o(|y - x|)/|y - x| \to 0$ as $y \to x$. Whenever $f$ is $C^1$-smooth near $x$, the
set $\hat{\partial} f(x)$ consists only of the gradient $\nabla f(x)$. The
subdifferentials $\partial_p f(x)$ and $\hat{\partial} f(x)$ are always convex, while
$\hat{\partial} f(x)$ is also closed. The limiting subdifferential of $f$ at $x$,
denoted $\partial f(x)$, consists of all vectors $v \in \mathbf{E}$ so that there
exist sequences $x_i$ and $v_i \in \hat{\partial} f(x_i)$ with
$(x_i, f(x_i), v_i) \to (x, f(x), v)$. The same object arises if the vectors $v_i$
are restricted instead to lie in $\partial_p f(x_i)$ for each index $i$; see for
example m Corollary 8.47]. The horizon subdifferential, denoted
$\partial^\infty f(x)$, consists of all limits of $\lambda_i v_i$ for some sequences
$v_i \in \hat{\partial} f(x_i)$ and $\lambda_i \ge 0$ satisfying $x_i \to x$ and
$\lambda_i \searrow 0$. This object records horizontal "normals" to the epigraph of
the function. For example, $f$ is locally Lipschitz continuous around $x$ if and only
if the set $\partial^\infty f(x)$ contains only the zero vector.

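The two key constructions at the heart of the paper are defined as follows.
Given a function $f : \mathbf{E} \to \overline{\mathbf{R}}$ and a parameter
$\alpha > 0$, the Moreau envelope $f_\alpha$ and the proximal mapping $P_\alpha f$
are defined by

\[ f_\alpha(x) := \inf_{y \in \mathbf{E}} \Big\{ f(y) + \frac{1}{2\alpha}|y - x|^2 \Big\}, \]

\[ P_\alpha f(x) := \operatorname*{argmin}_{y \in \mathbf{E}} \Big\{ f(y) + \frac{1}{2\alpha}|y - x|^2 \Big\}. \]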

Extending the definition slightly, we will set $f_0(x) := f(x)$. It is easy to see
that $f$ is prox-bounded if and only if there exists some point $x \in \mathbf{E}$
and a real $\alpha > 0$ satisfying $f_\alpha(x) > -\infty$.
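As a concrete illustration of the two constructions, the following minimal sketch takes the one-dimensional function $f(y) = |y|$ (our choice of example, not one used in the paper). Its proximal mapping is the familiar soft-thresholding operation, and the helper `envelope_grid` is a hypothetical brute-force check that minimizes $f(y) + \frac{1}{2\alpha}|y - x|^2$ over a uniform grid.

```python
import math

def prox_abs(x, alpha):
    # Proximal point of f(y) = |y|: soft-thresholding at level alpha.
    return math.copysign(max(abs(x) - alpha, 0.0), x)

def envelope_abs(x, alpha):
    # Moreau envelope of f(y) = |y|, evaluated at the proximal point.
    y = prox_abs(x, alpha)
    return abs(y) + (y - x) ** 2 / (2.0 * alpha)

def envelope_grid(f, x, alpha, lo=-10.0, hi=10.0, n=100001):
    # Brute-force check: minimize f(y) + |y - x|^2 / (2 alpha) over a uniform grid.
    step = (hi - lo) / (n - 1)
    return min(f(lo + k * step) + (lo + k * step - x) ** 2 / (2.0 * alpha)
               for k in range(n))
```

For instance, `envelope_abs(x, alpha)` and `envelope_grid(abs, x, alpha)` should agree up to the grid resolution for moderate values of `x`.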

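The proximal and Fréchet subdifferentials are conveniently characterized by a
differential property of the function $\alpha \mapsto f_\alpha(x + \alpha v)$. This
observation is recorded below. To this end, for any function
$\varphi : [0, \infty) \to \overline{\mathbf{R}}$, the one-sided derivative will be
denoted by

\[ \varphi'_+(0) := \lim_{\alpha \searrow 0} \frac{\varphi(\alpha) - \varphi(0)}{\alpha}. \]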

Lemma 2.1 (Subdifferential and the Moreau envelope).

Consider an lsc, prox-bounded function $f : \mathbf{E} \to \overline{\mathbf{R}}$,
and a point $x$ with $f(x)$ finite. Fix a vector $v \in \mathbf{E}$ and define the
function $\varphi : [0, \infty) \to \overline{\mathbf{R}}$ by setting
$\varphi(\alpha) := f_\alpha(x + \alpha v)$. Then the following are true.

(i) The vector $v$ lies in $\hat{\partial} f(x)$ if and only if

\[ \varphi'_+(0) = \frac{|v|^2}{2}. \tag{2.1} \]

(ii) The vector $v$ lies in $\partial_p f(x)$ if and only if there exists
$\alpha > 0$ satisfying $x \in P_\alpha f(x + \alpha v)$, or equivalently

\[ \varphi(\alpha) = f(x) + \frac{|v|^2}{2}\alpha. \]

In this case, the equation above continues to hold with $\alpha$ replaced by any
$\hat{\alpha} \in [0, \alpha]$.

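Proof. Claim (ii) is immediate from definitions; see for example na Proposition
8.46]. Hence we focus on claim (i). To this end, note first that the inequality

\[ \frac{f_\alpha(x + \alpha v) - f(x)}{\alpha} \le \frac{|v|^2}{2} \tag{2.2} \]

holds for any $v \in \mathbf{E}$ and any $\alpha > 0$, as one sees by evaluating the
expression inside the infimum defining $f_\alpha(x + \alpha v)$ at the point $y = x$.

Consider now a vector $v \in \hat{\partial} f(x)$ and any sequences
$\alpha_i \searrow 0$ and $x_i \in P_{\alpha_i} f(x + \alpha_i v)$. We may assume
$x_i \neq x$ since otherwise there's nothing to prove. Clearly $x_i$ tend to $x$ and
hence

\[ f_{\alpha_i}(x + \alpha_i v) - f(x) = f(x_i) - f(x) + \frac{1}{2\alpha_i}|(x_i - x) - \alpha_i v|^2 \ge o(|x_i - x|) + \frac{1}{2\alpha_i}|x_i - x|^2 + \frac{\alpha_i}{2}|v|^2. \]

Consequently, we obtain the inequality

\[ \frac{f_{\alpha_i}(x + \alpha_i v) - f(x)}{\alpha_i} \ge \frac{|x_i - x|}{\alpha_i} \cdot \frac{o(|x_i - x|)}{|x_i - x|} + \frac{1}{2}\left|\frac{x_i - x}{\alpha_i}\right|^2 + \frac{|v|^2}{2}. \]

Taking into account (2.2) yields the inequality

\[ 0 \ge \frac{|x_i - x|}{\alpha_i}\left( \frac{o(|x_i - x|)}{|x_i - x|} + \frac{1}{2}\,\frac{|x_i - x|}{\alpha_i} \right). \]

In particular, we deduce $\frac{|x_i - x|}{\alpha_i} \to 0$, and the equation (2.1)
follows.

Conversely suppose that equation (2.1) holds, and for the sake of contradiction
that $v$ does not lie in $\hat{\partial} f(x)$. Then there exists $\kappa > 0$ and a
sequence $y_i \to x$ satisfying

\[ f(y_i) - f(x) - \langle v, y_i - x \rangle < -\kappa |y_i - x|. \]

Then for any $\alpha > 0$, observe

\[ \frac{f_\alpha(x + \alpha v) - f(x)}{\alpha} \le \frac{1}{\alpha}\Big( f(y_i) - f(x) + \frac{1}{2\alpha}|(y_i - x) - \alpha v|^2 \Big) < -\kappa \frac{|y_i - x|}{\alpha} + \frac{1}{2}\left|\frac{y_i - x}{\alpha}\right|^2 + \frac{|v|^2}{2}. \]

Setting $\alpha_i := |y_i - x|/\kappa$, the right-hand-side becomes
$\frac{|v|^2}{2} - \frac{\kappa^2}{2}$, and letting $i$ tend to $\infty$ yields a
contradiction.

□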
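Lemma 2.1 can be checked by hand on the function $f(y) = |y|$ at the base point $x = 0$, where the Fréchet (and proximal) subdifferential is the interval $[-1, 1]$. The sketch below is our own illustration under that assumption; it uses the closed form of the envelope of the absolute value, and the helper name `one_sided_slope` is hypothetical.

```python
def envelope_abs(x, alpha):
    # Moreau envelope of f(y) = |y| in closed form (a Huber function).
    if abs(x) <= alpha:
        return x * x / (2.0 * alpha)
    return abs(x) - alpha / 2.0

def one_sided_slope(v, alpha=1e-6):
    # Difference quotient (phi(alpha) - phi(0)) / alpha for
    # phi(alpha) = f_alpha(0 + alpha * v), with phi(0) = f(0) = 0.
    return envelope_abs(alpha * v, alpha) / alpha
```

For $|v| \le 1$ the quotient equals $|v|^2/2$, in line with (2.1); for $|v| > 1$ it equals $|v| - \frac{1}{2}$, which is strictly smaller than $|v|^2/2$, so such $v$ is correctly rejected.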
3 Symmetry and orthogonal invariance

Next we recall a basic correspondence between symmetric functions and spectral
functions of symmetric matrices. The discussion follows that of [8]. Henceforth
$\mathbf{R}^n$ will denote an $n$-dimensional real Euclidean space with a specified
basis. Hence one can associate $\mathbf{R}^n$ with a collection of $n$-tuples
$(x_1, \ldots, x_n)$, in which case the inner product $\langle \cdot, \cdot \rangle$
is the usual dot product. The finite group of coordinate permutations of
$\mathbf{R}^n$ will be denoted by $\Pi^n$. A function
$f : \mathbf{R}^n \to \overline{\mathbf{R}}$ is symmetric whenever it is
$\Pi^n$-invariant, meaning

\[ f(\pi x) = f(x) \quad \text{for all } x \in \mathbf{R}^n \text{ and } \pi \in \Pi^n. \]

It is immediate to verify that if $f$ is symmetric, then so is the Moreau envelope
$f_\alpha$ for any $\alpha \ge 0$. This elementary observation will be important
later.

The vector space of real $n \times n$ symmetric matrices will be denoted by
$\mathbf{S}^n$ and will be endowed with the trace inner product
$\langle X, Y \rangle = \mathrm{tr}\, XY$, and the induced Frobenius norm
$|X| = \sqrt{\mathrm{tr}\, X^2}$. For any $x \in \mathbf{R}^n$, the symbol
$\mathrm{Diag}\, x$ will denote the $n \times n$ matrix with $x$ on its diagonal and
with zeros off the diagonal, while for a matrix $X \in \mathbf{S}^n$, the symbol
$\mathrm{diag}\, X$ will denote the $n$-vector of its diagonal entries.

The group of real $n \times n$ orthogonal matrices will be written as $O^n$. The
eigenvalue mapping $\lambda : \mathbf{S}^n \to \mathbf{R}^n$ assigns to each matrix
$X$ in $\mathbf{S}^n$ the vector of its eigenvalues
$(\lambda_1(X), \ldots, \lambda_n(X))$ in a nonincreasing order. A function
$F : \mathbf{S}^n \to \overline{\mathbf{R}}$ is spectral if it is $O^n$-invariant
under the conjugation action, meaning

\[ F(UXU^T) = F(X) \quad \text{for all } X \in \mathbf{S}^n \text{ and } U \in O^n. \]

In other words, spectral functions are those that depend on matrices only through
their eigenvalues. A basic fact is that any spectral function $F$ on $\mathbf{S}^n$
can be written as a composition $F = f \circ \lambda$ for some symmetric function $f$
on $\mathbf{R}^n$. Indeed, $f$ can be realized as the restriction of $F$ to diagonal
matrices $f(x) = F(\mathrm{Diag}\, x)$.
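The correspondence between symmetric and spectral functions is easy to try out numerically. The following sketch assumes NumPy is available; the function `sum_two_largest` is just one example of a symmetric function that we picked, and the helper names are ours.

```python
import numpy as np

def eigenvalues_desc(X):
    # lambda(X): eigenvalues of a symmetric matrix in nonincreasing order.
    return np.linalg.eigvalsh(X)[::-1]

def spectral(f):
    # Build F = f o lambda from a symmetric function f on R^n.
    return lambda X: f(eigenvalues_desc(X))

def sum_two_largest(x):
    # A permutation-invariant (symmetric) function on R^n.
    return float(np.sum(np.sort(x)[-2:]))

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
X = (M + M.T) / 2.0                                # random symmetric matrix
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
x = rng.standard_normal(4)
F = spectral(sum_two_largest)
```

Here `F(U @ X @ U.T)` should match `F(X)` up to rounding (orthogonal invariance), and `F(np.diag(x))` should match `sum_two_largest(x)` (diagonal restriction).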

Two matrices $X$ and $Y$ in $\mathbf{S}^n$ are said to admit a simultaneous spectral
decomposition if there exists an orthogonal matrix $U \in O^n$ such that $UXU^T$ and
$UYU^T$ are both diagonal matrices. It is well-known that this condition holds if and
only if $X$ and $Y$ commute. The matrices $X$ and $Y$ are said to admit a
simultaneous ordered spectral decomposition if there exists an orthogonal matrix
$U \in O^n$ satisfying $UXU^T = \mathrm{Diag}\, \lambda(X)$ and
$UYU^T = \mathrm{Diag}\, \lambda(Y)$. The following result characterizing this
property, essentially due to Theobald [19] and von Neumann [20], plays a central
role in spectral variational analysis.

Theorem 3.1 (Von Neumann-Theobald). Any two matrices $X$ and $Y$ in $\mathbf{S}^n$
satisfy the inequality

\[ |\lambda(X) - \lambda(Y)| \le |X - Y|. \]

Equality holds if and only if $X$ and $Y$ admit a simultaneous ordered spectral
decomposition.
This result is often called a trace inequality, since the eigenvalue mapping being
1-Lipschitz (as in the statement above) is equivalent to the inequality

\[ \langle \lambda(X), \lambda(Y) \rangle \ge \langle X, Y \rangle \quad \text{for all } X, Y \in \mathbf{S}^n. \]
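Both forms of the inequality, and the equality case, can be sanity-checked on random data. This is only a numerical sketch assuming NumPy; it samples matrices rather than proving anything, and the variable names are ours.

```python
import numpy as np

def eigenvalues_desc(X):
    # Eigenvalues of a symmetric matrix in nonincreasing order.
    return np.linalg.eigvalsh(X)[::-1]

def random_symmetric(n, rng):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2.0

rng = np.random.default_rng(1)
lipschitz_gaps, trace_gaps = [], []
for _ in range(200):
    X, Y = random_symmetric(5, rng), random_symmetric(5, rng)
    lx, ly = eigenvalues_desc(X), eigenvalues_desc(Y)
    # |X - Y| (Frobenius norm) minus |lambda(X) - lambda(Y)| should be >= 0.
    lipschitz_gaps.append(np.linalg.norm(X - Y) - np.linalg.norm(lx - ly))
    # <lambda(X), lambda(Y)> - <X, Y> should be >= 0.
    trace_gaps.append(lx @ ly - np.trace(X @ Y))

# Equality case: X and Y share an ordered eigenbasis U.
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
x = np.sort(rng.standard_normal(5))[::-1]
y = np.sort(rng.standard_normal(5))[::-1]
Xs, Ys = U @ np.diag(x) @ U.T, U @ np.diag(y) @ U.T
equality_gap = np.linalg.norm(Xs - Ys) - np.linalg.norm(x - y)
```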

4 Derivation of the subdifferential formula 

In this section, we derive the subdifferential formula for spectral functions. In what
follows, for any matrix $X \in \mathbf{S}^n$ define the diagonalizing matrix set

\[ O^n_X := \{ U \in O^n : U(\mathrm{Diag}\, \lambda(X))U^T = X \}. \]

The spectral subdifferential formula readily follows from Lemma 2.1 and the following
intuitive proposition, a proof of which can essentially be seen in [2, Proposition 8].

Theorem 4.1 (Proximal analysis of spectral functions).

Consider a symmetric function $f : \mathbf{R}^n \to \overline{\mathbf{R}}$. Then the
equation

\[ (f \circ \lambda)_\alpha = f_\alpha \circ \lambda \quad \text{holds.} \tag{4.1} \]

In addition, the proximal mapping admits the representation:

\[ P_\alpha(f \circ \lambda)(X) = \{ U(\mathrm{Diag}\, y)U^T : y \in P_\alpha f(\lambda(X)),\ U \in O^n_X \}. \tag{4.2} \]

Moreover, for any $Y \in P_\alpha(f \circ \lambda)(X)$ the matrices $X$ and $Y$ admit
a simultaneous ordered spectral decomposition.

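Proof. For any $X$ and $Y$, applying the trace inequality (Theorem 3.1), we deduce

\[ f(\lambda(Y)) + \frac{1}{2\alpha}|Y - X|^2 \ge f(\lambda(Y)) + \frac{1}{2\alpha}|\lambda(Y) - \lambda(X)|^2 \ge f_\alpha(\lambda(X)). \tag{4.3} \]

Taking the infimum over $Y$, we deduce
$(f \circ \lambda)_\alpha(X) \ge f_\alpha(\lambda(X))$. On the other hand, for any
$U \in O^n_X$, the inequalities hold:

\[ (f \circ \lambda)_\alpha(X) = \inf_{Y} \Big\{ f(\lambda(Y)) + \frac{1}{2\alpha}|Y - X|^2 \Big\} = \inf_{Y} \Big\{ f(\lambda(Y)) + \frac{1}{2\alpha}|U^T Y U - \mathrm{Diag}\, \lambda(X)|^2 \Big\} \le f_\alpha(\lambda(X)), \]

where the last inequality follows by restricting the infimum to matrices of the form
$Y = U(\mathrm{Diag}\, y)U^T$ with $y \in \mathbf{R}^n$ and using the symmetry of $f$.
This establishes (4.1).

To establish equation (4.2), consider first a matrix $U \in O^n_X$ and a vector
$y \in P_\alpha f(\lambda(X))$, and define $Y := U(\mathrm{Diag}\, y)U^T$. Then we have

\[ (f \circ \lambda)(Y) + \frac{1}{2\alpha}|Y - X|^2 = f(y) + \frac{1}{2\alpha}|y - \lambda(X)|^2 = f_\alpha(\lambda(X)) = (f \circ \lambda)_\alpha(X). \]

Hence the inclusion $Y \in P_\alpha(f \circ \lambda)(X)$ is valid, as claimed.
Conversely, fix any matrix $Y \in P_\alpha(f \circ \lambda)(X)$. Then plugging $Y$
into (4.3), the left-hand-side equals $(f \circ \lambda)_\alpha(X)$ and hence the two
inequalities in (4.3) hold as equalities. The second equality immediately yields the
inclusion $\lambda(Y) \in P_\alpha f(\lambda(X))$, while the first along with
Theorem 3.1 implies that $X$ and $Y$ admit a simultaneous ordered spectral
decomposition, as claimed. □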
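To see Theorem 4.1 in action, the sketch below takes $f$ to be the $\ell_1$ norm on $\mathbf{R}^n$ (an example of our choosing), whose proximal mapping is coordinatewise soft-thresholding. The candidate proximal point of $f \circ \lambda$ is then assembled as in (4.2). NumPy is assumed; since `numpy.linalg.eigh` returns eigenvalues in ascending order, the code relies on the fact that soft-thresholding commutes with permutations, so the ordering does not affect the resulting matrix.

```python
import numpy as np

def soft_threshold(x, alpha):
    # Proximal mapping of the l1 norm on R^n (acts coordinatewise).
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def l1_envelope(x, alpha):
    # Moreau envelope of the l1 norm, evaluated through its proximal point.
    y = soft_threshold(x, alpha)
    return np.sum(np.abs(y)) + np.sum((y - x) ** 2) / (2.0 * alpha)

def spectral_l1(X):
    # (f o lambda)(X) for f = l1 norm: the sum of absolute eigenvalues.
    return np.sum(np.abs(np.linalg.eigvalsh(X)))

def spectral_prox(X, alpha):
    # Formula (4.2): U Diag(y) U^T with y = prox of f at the eigenvalues of X.
    w, U = np.linalg.eigh(X)
    return U @ np.diag(soft_threshold(w, alpha)) @ U.T

def objective(Y, X, alpha):
    # The function of Y minimized in the definition of (f o lambda)_alpha(X).
    return spectral_l1(Y) + np.linalg.norm(Y - X) ** 2 / (2.0 * alpha)

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
X = (M + M.T) / 2.0
alpha = 0.7
Y = spectral_prox(X, alpha)
```

With these definitions, `objective(Y, X, alpha)` should coincide with `l1_envelope(np.linalg.eigvalsh(X), alpha)`, which is equation (4.1) for this particular $f$, and symmetric perturbations of `Y` should not decrease the objective.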

Combining Lemma 2.1 and Theorem 4.1, the main result of the paper readily
follows.

Theorem 4.2 (Subdifferentials of spectral functions). Consider an lsc symmetric
function $f : \mathbf{R}^n \to \overline{\mathbf{R}}$. Then the following equation
holds:

\[ \partial (f \circ \lambda)(X) = \{ U(\mathrm{Diag}\, v)U^T : v \in \partial f(\lambda(X)),\ U \in O^n_X \}. \tag{4.4} \]

Analogous formulas hold for the proximal, Fréchet, and horizon subdifferentials.

Proof. Fix a matrix $X$ in the domain of $f \circ \lambda$ and define
$x := \lambda(X)$. Without loss of generality, suppose that $f$ is lower-bounded.
Indeed if this were not the case, then since $f$ is lsc there exists
$\varepsilon > 0$ so that $f$ is lower-bounded on the ball $B_\varepsilon(x)$.
Consequently adding to $f$ the indicator function of the symmetric set
$\bigcup_{\pi \in \Pi^n} B_\varepsilon(\pi x)$ assures that the function is
lower-bounded.

We first dispense with the easy inclusion $\subset$ for all the subdifferentials. To
this end, recall that if $V$ is a proximal subgradient of $f \circ \lambda$ at $X$,
then there exists $\alpha > 0$ satisfying
$X \in P_\alpha(f \circ \lambda)(X + \alpha V)$. Theorem 4.1 then implies that $X$
and $V$ commute. Taking limits, we deduce that all Fréchet, limiting, and horizon
subgradients of $f \circ \lambda$ at $X$ also commute with $X$. Recalling that
commuting matrices admit simultaneous spectral decomposition, basic definitions
immediately yield the inclusion $\subset$ in equation (4.4) for the proximal and for
the Fréchet subdifferentials. Taking limits, we deduce the inclusion $\subset$ in
(4.4) for the limiting and for the horizon subdifferentials, as well.

Next, we argue the reverse inclusion. To this end, define
$V := U(\mathrm{Diag}\, v)U^T$ for an arbitrary matrix $U \in O^n_X$ and any vector
$v \in \mathbf{R}^n$. Then Theorem 4.1, along with the symmetry of the envelope
$f_\alpha$, yields the equation

\[ \frac{(f \circ \lambda)_\alpha(X + \alpha V) - f(\lambda(X))}{\alpha} = \frac{f_\alpha(x + \alpha v) - f(x)}{\alpha}. \]

Consequently if $v$ lies in $\partial_p f(x)$, then Lemma 2.1 shows that for some
$\alpha > 0$ the right-hand-side equals $\frac{|v|^2}{2}$, or equivalently
$\frac{|V|^2}{2}$. Lemma 2.1 then yields the inclusion
$V \in \partial_p(f \circ \lambda)(X)$. Similarly if $v$ lies in
$\hat{\partial} f(x)$, then the same argument but with $\alpha$ tending to $0$ shows
that $V$ lies in $\hat{\partial}(f \circ \lambda)(X)$. Thus the inclusion $\supset$
in equation (4.4) holds for the proximal and for the Fréchet subdifferentials. Taking
limits, the same inclusion holds for the limiting and for the horizon
subdifferentials. This completes the proof. □
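Formula (4.4) can be illustrated on the largest eigenvalue function, which corresponds to the symmetric function $f(x) = \max_i x_i$. The sketch below is our own example, assuming NumPy: the matrix $X$ has a repeated top eigenvalue, so $f$ is nonsmooth at $\lambda(X)$ and its subdifferential there is the set of convex combinations of $e_1$ and $e_2$. Because $\lambda_{\max}$ is convex, membership of the candidate $V$ in the subdifferential can be probed through the usual convex subgradient inequality on random test matrices.

```python
import numpy as np

def lambda_max(X):
    # Largest eigenvalue of a symmetric matrix.
    return np.linalg.eigvalsh(X)[-1]

# X has a repeated top eigenvalue, so f(x) = max_i x_i is nonsmooth at lambda(X).
lam = np.array([2.0, 2.0, 1.0])
X = np.diag(lam)

# Any U rotating only within the top eigenspace satisfies U Diag(lambda(X)) U^T = X.
theta = 0.4
U = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# v is a subgradient of max at lam: a convex combination of e_1 and e_2.
v = np.array([0.3, 0.7, 0.0])
V = U @ np.diag(v) @ U.T          # candidate subgradient of lambda_max at X

rng = np.random.default_rng(3)
slacks = []
for _ in range(500):
    M = rng.standard_normal((3, 3))
    Y = (M + M.T) / 2.0
    # Subgradient inequality: lambda_max(Y) >= lambda_max(X) + <V, Y - X>.
    slacks.append(lambda_max(Y) - lambda_max(X) - np.trace(V @ (Y - X)))
```

All entries of `slacks` should be nonnegative up to rounding, which is consistent with $V$ being a subgradient of $\lambda_{\max}$ at $X$.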


















Remark 4.3. It easily follows from Theorem 4.2 that the inclusion $\supset$ holds for
the Clarke subdifferential. The reverse inclusion, however, requires a separate
argument given in [8, Sections 7-8].

In conclusion, we should mention that all the arguments in the section apply
equally well for Hermitian matrices (with the standard Hermitian trace product),
with the orthogonal matrices replaced by unitary matrices. Entirely analogous
arguments also apply for functions of singular values of rectangular matrices (real
or complex). For more details, see the appendix in the arXiv version of the paper.


5 Hessians of $C^2$-smooth spectral functions


In this section, we revisit the second-order theory of spectral functions. To this
end, fix for the entire section an lsc symmetric function
$f : \mathbf{R}^n \to \overline{\mathbf{R}}$ and define the spectral function
$F := f \circ \lambda$ on $\mathbf{S}^n$. It is well known that $F$ is $C^2$-smooth
around a matrix $X$ if and only if $f$ is $C^2$-smooth around $\lambda(X)$; see
[10, 16-18]. Moreover, a formula for the Hessian of $F$ is available: for matrices
$A = \mathrm{Diag}(a)$ and $B \in \mathbf{S}^n$ we have

\[ \nabla^2 F(A)[B] = \mathrm{Diag}\big(\nabla^2 f(a)\,\mathrm{diag}(B)\big) + \mathcal{A} \circ B, \]

where $\mathcal{A} \circ B$ is the Hadamard product and

\[ \mathcal{A}_{ij} = \begin{cases} \dfrac{\nabla f(a)_i - \nabla f(a)_j}{a_i - a_j} & \text{if } a_i \neq a_j, \\[2mm] \nabla^2 f(a)_{ii} - \nabla^2 f(a)_{ij} & \text{if } a_i = a_j. \end{cases} \]

The assumption that $A$ is a diagonal matrix is made without loss of generality,
as will be apparent shortly. In this section, we provide a transparent geometric
derivation of the Hessian formula by considering invariance properties of
$\mathrm{gph}\, \nabla F$. Some of our arguments give a geometric interpretation of
the techniques in [1^.

Remark 5.1 (Hessian and the gradient graph). Throughout the section we will appeal
to the following basic property of the Hessian. For any $C^2$-smooth function $g$ on
a Euclidean space, the vector $z := \nabla^2 g(a)[b]$ is the unique vector satisfying
$(z, -b) \in N_{\mathrm{gph}\, \nabla g}(a, \nabla g(a))$.

Consider now the action of the orthogonal group $O^n$ on $\mathbf{S}^n$ by
conjugation, namely $U.X = UXU^T$. Recall that $F$ is invariant under this action,
meaning $F(U.X) = F(X)$ for all orthogonal matrices $U$. This action naturally
extends to the product space $\mathbf{S}^n \times \mathbf{S}^n$ by setting
$U.(X, Y) = (U.X, U.Y)$. As we have seen, the graph $\mathrm{gph}\, \nabla F$ is then
invariant with respect to this action:

\[ U.\mathrm{gph}\, \nabla F = \mathrm{gph}\, \nabla F \quad \text{for all } U \in O^n. \]

One immediate observation is that
$N_{\mathrm{gph}\, \nabla F}(U.X, U.Y) = U.N_{\mathrm{gph}\, \nabla F}(X, Y)$.
Consequently we deduce

\[ (Z, -B) \in N_{\mathrm{gph}\, \nabla F}(X, Y) \iff (U.Z, -U.B) \in N_{\mathrm{gph}\, \nabla F}(U.X, U.Y). \]

The formula

\[ \nabla^2 F(X)[B] = U^T.\nabla^2 F(U.X)[U.B] \tag{5.1} \]

now follows directly from Remark 5.1 whenever $F$ is $C^2$-smooth around $X$. As a
result, when speaking about the operator $\nabla^2 F(X)$, we may assume without loss
of generality that $X$ and $\nabla F(X)$ are both diagonal matrices.

Next we briefly recall a few rudimentary properties of the conjugation action; see
for example [HI Sections 4, 8, 9]. We say that an $n \times n$ matrix $W$ is
skew-symmetric if $W^T = -W$. Then it is well-known that $O^n$ is a smooth manifold
and the tangent space to $O^n$ at the identity matrix consists of skew-symmetric
matrices:

\[ T_{O^n}(I) = \{ W \in \mathbf{R}^{n \times n} : W \text{ is skew-symmetric} \}. \]

The commutator of two matrices $A, B \in \mathbf{R}^{n \times n}$, denoted by
$[A, B]$, is the matrix $[A, B] := AB - BA$. An easy computation shows that the
commutator of a skew-symmetric matrix with a symmetric matrix is itself symmetric.
Moreover, the identity

\[ \langle X, [W, Z] \rangle = \langle [X, W], Z \rangle \]

holds for any matrices $X, Z \in \mathbf{S}^n$ and skew-symmetric $W$. For any matrix
$A \in \mathbf{S}^n$, the orbit of $A$, denoted by $O^n.A$, is the set

\[ O^n.A = \{ U.A : U \in O^n \}. \]

Similarly, the orbit of a pair $(A, B) \in \mathbf{S}^n \times \mathbf{S}^n$ is the
set

\[ O^n.(A, B) = \{ (U.A, U.B) : U \in O^n \}. \]

A standard computation$^1$ now shows that orbits are smooth manifolds with tangent
spaces

\[ T_{O^n.A}(A) = \{ [W, A] : W \text{ is skew-symmetric} \}, \]

\[ T_{O^n.(A,B)}(A, B) = \{ ([W, A], [W, B]) : W \text{ is skew-symmetric} \}. \]

Now supposing that $F$ is twice differentiable at a matrix $A \in \mathbf{S}^n$, the
graph $\mathrm{gph}\, \nabla F$ certainly contains the orbit
$O^n.(A, \nabla F(A))$. In particular, this implies that the tangent space to
$\mathrm{gph}\, \nabla F$ at $(A, \nabla F(A))$ contains the tangent space to the
orbit:

\[ \{ ([W, A], [W, \nabla F(A)]) : W \text{ skew-symmetric} \}. \]

Thus for any $B \in \mathbf{S}^n$, the tuple $(\nabla^2 F(A)[B], -B)$ is orthogonal
to the tuple $([W, A], [W, \nabla F(A)])$ for any skew-symmetric matrix $W$. We
record this elementary observation in the following lemma. This also appears as m
Lemma 3.2].

$^1$Compute the differential of the mapping $O^n \ni U \mapsto U.A$.

Lemma 5.2 (Orthogonality to orbits). Suppose $F$ is $C^2$-smooth around
$A \in \mathbf{S}^n$. Then for any skew-symmetric matrix $W$ and any
$B \in \mathbf{S}^n$, we have

\[ \langle \nabla^2 F(A)[B], [W, A] \rangle = \langle B, [W, \nabla F(A)] \rangle. \]

Proof. This is immediate from the preceding discussion. □
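The commutator facts and Lemma 5.2 lend themselves to a quick numerical check. The sketch below assumes NumPy and uses, purely as an example of our choosing, the spectral function $F(X) = \log \mathrm{tr} \exp(X)$, that is, $f$ equal to the log-sum-exp function; its gradient is $\exp(X)/\mathrm{tr} \exp(X)$. The Hessian action is approximated by central finite differences of the gradient, so the comparison is only accurate up to discretization error.

```python
import numpy as np

def comm(A, B):
    # Commutator [A, B] = AB - BA.
    return A @ B - B @ A

def inner(A, B):
    # Trace inner product <A, B> = tr(A^T B).
    return np.trace(A.T @ B)

def grad_F(X):
    # Gradient of F(X) = log tr exp(X), computed through an eigendecomposition.
    w, U = np.linalg.eigh(X)
    p = np.exp(w - w.max())
    p /= p.sum()
    return U @ np.diag(p) @ U.T

def hess_F_fd(X, B, t=1e-5):
    # Central finite-difference approximation of the Hessian action on B.
    return (grad_F(X + t * B) - grad_F(X - t * B)) / (2.0 * t)

rng = np.random.default_rng(4)
sym = lambda M: (M + M.T) / 2.0
A, B, Z = (sym(rng.standard_normal((4, 4))) for _ in range(3))
K = rng.standard_normal((4, 4))
W = K - K.T                                   # skew-symmetric matrix

lhs = inner(hess_F_fd(A, B), comm(W, A))      # <Hess F(A)[B], [W, A]>
rhs = inner(B, comm(W, grad_F(A)))            # <B, [W, grad F(A)]>
```

The two numbers `lhs` and `rhs` should agree to within the finite-difference tolerance.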

Next recall that the stabilizer of a matrix $A \in \mathbf{S}^n$ is the set:

\[ \mathrm{Stab}(A) = \{ U \in O^n : U.A = A \}. \]

Similarly we may define the set $\mathrm{Stab}(A, B)$.

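Lemma 5.3 (Tangent space to the stabilizer). For any matrices
$A, B \in \mathbf{S}^n$, the tangent spaces to $\mathrm{Stab}(A)$ and to
$\mathrm{Stab}(A, B)$ at the identity matrix are the sets

\[ \{ W \in \mathbf{R}^{n \times n} : W \text{ skew-symmetric},\ [W, A] = 0 \}, \]

\[ \{ W \in \mathbf{R}^{n \times n} : W \text{ skew-symmetric},\ [W, A] = [W, B] = 0 \}, \]

respectively.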

(Proof sketch). Define the orbit map $\theta^{(A)} : O^n \to O^n.A$ by setting
$\theta^{(A)}(U) := U.A$. A quick computation shows that $\theta^{(A)}$ is
equivariant with respect to the left-multiplication action of $O^n$ on itself and the
conjugation action of $O^n$ on $O^n.A$. Hence the equivariant rank theorem
([HI Theorem 7.25]) implies that $\theta^{(A)}$ has constant rank. In fact, since
$\theta^{(A)}$ is surjective, it is a submersion. It follows that the stabilizer

\[ \mathrm{Stab}(A) = (\theta^{(A)})^{-1}(A) \]

is a smooth manifold with tangent space at the identity equal to the kernel of the
differential $d\theta^{(A)}\big|_{U=I}(W) = [W, A]$. The expression for the tangent
space to $\mathrm{Stab}(A)$ immediately follows. The analogous expression for
$\mathrm{Stab}(A, B)$ follows along similar lines. □

With this, we are able to state and prove the main theorem. 

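Theorem 5.4 (Hessian of $C^2$-smooth spectral functions). Consider a symmetric
function $f : \mathbf{R}^n \to \overline{\mathbf{R}}$ and the spectral function
$F = f \circ \lambda$. Suppose that $F$ is $C^2$-smooth around a matrix
$A := \mathrm{Diag}(a)$ and for any matrix $B \in \mathbf{S}^n$ define
$Z := \nabla^2 F(A)[B]$. Then equality

\[ \mathrm{diag}(Z) = \nabla^2 f(a)[\mathrm{diag}(B)] \]

holds, while for indices $i \neq j$, we have

\[ Z_{ij} = \begin{cases} B_{ij}\,\dfrac{\nabla f(a)_i - \nabla f(a)_j}{a_i - a_j} & \text{if } a_i \neq a_j, \\[2mm] B_{ij}\big(\nabla^2 f(a)_{ii} - \nabla^2 f(a)_{ij}\big) & \text{if } a_i = a_j. \end{cases} \]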

Proof. First observe that clearly $f$ must be smooth at $a$. Now, since $A$ is
diagonal, so is the gradient $\nabla F(A)$. So without loss of generality, we can
assume $\nabla F(A) = \mathrm{Diag}(\nabla f(a))$.

Observe now that $(Z, -B)$ is orthogonal to the tangent space of
$\mathrm{gph}\, \nabla F$ at $(A, \nabla F(A))$. On the other hand, for any vector
$a' \in \mathbf{R}^n$, we have equality

\[ \left\langle \begin{pmatrix} Z \\ -B \end{pmatrix}, \begin{pmatrix} \mathrm{Diag}(a') - \mathrm{Diag}(a) \\ \mathrm{Diag}(\nabla f(a')) - \mathrm{Diag}(\nabla f(a)) \end{pmatrix} \right\rangle = \left\langle \begin{pmatrix} \mathrm{diag}(Z) \\ -\mathrm{diag}(B) \end{pmatrix}, \begin{pmatrix} a' - a \\ \nabla f(a') - \nabla f(a) \end{pmatrix} \right\rangle. \]

It follows immediately that the tuple $(\mathrm{diag}(Z), -\mathrm{diag}(B))$ is
orthogonal to the tangent space of $\mathrm{gph}\, \nabla f$ at $(a, \nabla f(a))$.
Hence we deduce the equality $\mathrm{diag}(Z) = \nabla^2 f(a)[\mathrm{diag}(B)]$ as
claimed.

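Next fix indices $i$ and $j$ with $a_i \neq a_j$, and define the skew-symmetric
matrix $W^{(i,j)} := e_i e_j^T - e_j e_i^T$, where $e_k$ denotes the $k$'th standard
basis vector. Applying Lemma 5.2 with the skew-symmetric matrix
$W = \frac{1}{a_i - a_j} W^{(i,j)}$, we obtain

\[ -2Z_{ij} = \langle Z, [W, A] \rangle = \langle B, [W, \nabla F(A)] \rangle = -2B_{ij}\,\frac{\nabla f(a)_i - \nabla f(a)_j}{a_i - a_j}. \]

The claimed formula $Z_{ij} = B_{ij}\,\frac{\nabla f(a)_i - \nabla f(a)_j}{a_i - a_j}$
follows.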

Finally, fix indices $i$ and $j$, with $a_i = a_j$. Observe now the inclusion

\[ \mathrm{Stab}(A) \subset \mathrm{Stab}(\nabla F(A)). \]

Indeed for any matrix $U \in \mathrm{Stab}(A)$, we have

\[ \nabla F(A) = \nabla F(UAU^T) = U\nabla F(A)U^T. \]

This in particular immediately implies that the tangent space
$T_{\mathrm{gph}\, \nabla F}(A, \nabla F(A))$ is invariant under the action of
$\mathrm{Stab}(A)$, that is

\[ U.T_{\mathrm{gph}\, \nabla F}(A, \nabla F(A)) = T_{\mathrm{gph}\, \nabla F}(A, \nabla F(A)) \]

for any $U \in \mathrm{Stab}(A)$. Hence the entire orbit $\mathrm{Stab}(A).(X, Y)$ of
any tangent vector $(X, Y) \in T_{\mathrm{gph}\, \nabla F}(A, \nabla F(A))$ is
contained in the tangent space $T_{\mathrm{gph}\, \nabla F}(A, \nabla F(A))$. We
conclude that the tangent space to such an orbit $\mathrm{Stab}(A).(X, Y)$ at
$(X, Y)$ is contained in $T_{\mathrm{gph}\, \nabla F}(A, \nabla F(A))$ as well.

Define now the matrices $E_i := \mathrm{Diag}(e_i)$ and
$\hat{Z} := \mathrm{Diag}(\nabla^2 f(a)[e_i])$. Because $F$ is $C^2$-smooth, clearly
the inclusion $(E_i, \hat{Z}) \in T_{\mathrm{gph}\, \nabla F}(A, \nabla F(A))$ holds.
The above argument, along with Lemma 5.3, immediately implies the inclusion

\[ \{ ([W, E_i], [W, \hat{Z}]) : W \text{ skew-symmetric},\ [W, A] = 0 \} \subset T_{\mathrm{gph}\, \nabla F}(A, \nabla F(A)) \]

and in particular, $([W, E_i], [W, \hat{Z}])$ is orthogonal to $(Z, -B)$ for any
skew-symmetric $W$ satisfying $[W, A] = 0$. To finish the proof, simply set
$W = W^{(i,j)}$. Then since $a_i = a_j$, we have $[W, A] = 0$ and therefore

\[ -2Z_{ij} = \langle Z, [W^{(i,j)}, E_i] \rangle = \langle B, [W^{(i,j)}, \hat{Z}] \rangle = -2B_{ij}(\hat{Z}_{ii} - \hat{Z}_{jj}) = -2B_{ij}\big(\nabla^2 f(a)_{ii} - \nabla^2 f(a)_{ij}\big), \]

as claimed. This completes the proof. □
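The Hessian formula of Theorem 5.4 can be compared against finite differences. The sketch below assumes NumPy and again uses, as an example of our choosing, $f$ equal to log-sum-exp, so that $\nabla f$ is the softmax vector $p$ and $\nabla^2 f = \mathrm{Diag}(p) - pp^T$. The vector `a` has a repeated entry on purpose, so both cases of the formula are exercised; the threshold `tol` deciding when two entries count as equal is an implementation assumption.

```python
import numpy as np

def grad_f(a):
    # Gradient of f(x) = log(sum_i exp(x_i)): the softmax vector.
    p = np.exp(a - a.max())
    return p / p.sum()

def hess_f(a):
    # Hessian of log-sum-exp: Diag(p) - p p^T.
    p = grad_f(a)
    return np.diag(p) - np.outer(p, p)

def grad_F(X):
    # Gradient of F = f o lambda, namely U Diag(grad f(lambda(X))) U^T.
    w, U = np.linalg.eigh(X)
    return U @ np.diag(grad_f(w)) @ U.T

def hessian_formula(a, B, tol=1e-12):
    # Theorem 5.4 at A = Diag(a): Diag(hess_f(a) diag(B)) + Acal o B.
    g, H = grad_f(a), hess_f(a)
    n = a.size
    Acal = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if abs(a[i] - a[j]) > tol:
                Acal[i, j] = (g[i] - g[j]) / (a[i] - a[j])
            else:
                Acal[i, j] = H[i, i] - H[i, j]
    return np.diag(H @ np.diag(B)) + Acal * B

a = np.array([1.0, 1.0, 0.2, -0.5])    # repeated entry exercises the case a_i = a_j
rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
B = (M + M.T) / 2.0
t = 1e-5
# Central finite difference of the gradient in the direction B.
Z_fd = (grad_F(np.diag(a) + t * B) - grad_F(np.diag(a) - t * B)) / (2.0 * t)
Z = hessian_formula(a, B)
```

The matrices `Z` and `Z_fd` should agree entrywise up to the finite-difference error.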

Remark 5.5. The appealing geometric techniques presented in this section seem
promising for obtaining at least necessary conditions for the generalized Hessian, in
the sense of [13], of spectral functions that are not necessarily $C^2$-smooth. Indeed
the arguments presented deal entirely with the graph $\mathrm{gph}\, \nabla f$, a
setting perfectly adapted to generalized Hessian computations. There are
difficulties, however. To illustrate, consider a matrix
$Z \in \partial^2 F(A|V)$. Then one can easily establish properties of
$\mathrm{diag}\, Z$ analogous to those presented in Theorem 5.4, as well as
properties of $Z_{ij}$ for indices $i$ and $j$ satisfying $a_i \neq a_j$. The
difficulty occurs for indices $i$ and $j$ with $a_i = a_j$. In this case, our argument
used explicitly the fact that tangent cones to $\mathrm{gph}\, \partial f$ are linear
subspaces, a property that is decisively false in the general setting.

A Comments on isometric group actions 

It is clear from Sections 1-4 that there is a richer underlying structure governing
the results of Theorems 4.1 and 4.2, with the trace inequality (Theorem 3.1) playing
an essential role. This appendix outlines a rudimentary algebraic framework in
which the previous arguments can be understood, unifying the eigenvalue and the
singular value pictures m, while leaving room for new settings to be explored.

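Fix a metric space $\mathcal{V}$ and a group $\mathcal{G}$ acting on $\mathcal{V}$ by
isometries. Let $\mathcal{H}$ be another metric space injecting isometrically by a
mapping $i : \mathcal{H} \to \mathcal{V}$ into $\mathcal{V}$. Intuitively,
$\mathcal{H}$ is a subset of $\mathcal{V}$ with $i$ the canonical injection.
Notationally, however, it is cleaner to consider $\mathcal{H}$ as a separate entity.
Without loss of generality, we will use the symbol $d(\cdot, \cdot)$ to denote the
metric both in $\mathcal{V}$ and in $\mathcal{H}$. Fix also a distinguished
$\mathcal{G}$-invariant mapping $p : \mathcal{V} \to \mathcal{H}$. The diagram
summarizes the notation:

\[ \mathcal{H} \underset{p}{\overset{i}{\rightleftarrows}} \mathcal{V} \]

It is instructive to keep in mind the following motivating examples:

\[ \mathbf{R}^n \underset{\lambda}{\overset{\mathrm{Diag}}{\rightleftarrows}} \mathbf{S}^n, \qquad \mathbf{R}^n \underset{\lambda}{\overset{\mathrm{Diag}}{\rightleftarrows}} \mathbf{H}^n, \qquad \mathbf{R}^m \underset{\sigma}{\overset{\mathrm{Diag}}{\rightleftarrows}} \mathbf{R}^{m \times n}, \qquad \mathbf{R}^m \underset{\sigma}{\overset{\mathrm{Diag}}{\rightleftarrows}} \mathbf{C}^{m \times n}. \]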

In the first example (the focus of the previous sections), the group
$\mathcal{G} = O^n$ acts by conjugation $U.X = UXU^T$. In the second example,
$\mathbf{H}^n$ is the space of $n \times n$ Hermitian matrices (with the standard
Hermitian inner product $\langle X, Y \rangle = \mathrm{re}\, \mathrm{tr}\, X^*Y$),
and $\mathcal{G}$ is the unitary group acting by $U.X = UXU^*$. In the third example
$\mathbf{R}^{m \times n}$ is the space of real $m \times n$ matrices (with the trace
product $\langle X, Y \rangle = \mathrm{tr}\, X^T Y$), the group
$\mathcal{G} = O^m \times O^n$ acts by $(U, V).X = UXV^T$, and $\sigma$ is the mapping
assigning to each $m \times n$ matrix its vector of singular values in a
nonincreasing order. The fourth example is analogous. The goal of this section is to
isolate the shared features of the four examples above that make a subdifferential
formula along the lines of (4.4) possible. That is, we aim to investigate conditions
on $p$ under which one can effectively treat $\mathcal{G}$-invariant functions
$F : \mathcal{V} \to \overline{\mathbf{R}}$ by instead considering their restrictions
$F \circ i : \mathcal{H} \to \overline{\mathbf{R}}$.

Some notational abstraction will greatly help simplify the ensuing formulas. To this
end, following standard terminology, the pullback of any mapping $F$ on $\mathcal{V}$
is the mapping $F^* := F \circ i$ defined now on $\mathcal{H}$. Similarly the
pullback of a mapping $f$ on $\mathcal{H}$ is the mapping $f^* := f \circ p$ on
$\mathcal{V}$. The pushforward of $p$ is the mapping
$p_* = i \circ p : \mathcal{V} \to \mathcal{V}$. For instance in the first example,
for any function $F : \mathbf{S}^n \to \overline{\mathbf{R}}$, the pullback $F^*$ is
the diagonal restriction $x \mapsto F(\mathrm{Diag}(x))$; the pullback of a function
$f$ on $\mathbf{R}^n$ is the spectral mapping $f^* = f \circ \lambda$; and the
pullback of $\lambda$ is the reordering mapping
$\lambda^* : \mathbf{R}^n \to \mathbf{R}^n$, meaning $\lambda^*(x)$ is obtained by
permuting coordinates of $x$ to be nonincreasing. The following definition identifies
the salient properties needed, in light of the current paper, for effective treatment
of $\mathcal{G}$-invariant functions $F : \mathcal{V} \to \overline{\mathbf{R}}$ by
means of their restrictions $F^* : \mathcal{H} \to \overline{\mathbf{R}}$. For
clarity, elements of $\mathcal{H}$ will be denoted with lower-case letters, while
elements of $\mathcal{V}$ will be denoted with upper-case letters.

Definition A.1 (Metric reduction). The space $\mathcal{V}$ metrically reduces to
$\mathcal{H}$ if the following compatibility conditions hold:

1. (Idempotence) $p_* \circ p_* = p_*$;

2. (Orbit preservation) $p_*(X)$ lies in the $\mathcal{G}$-orbit of $X$, for all $X \in \mathcal{V}$;

3. (Non-expansiveness) $d(p(X), p(Y)) \le d(X, Y)$ for all $X, Y \in \mathcal{V}$.

The reduction is faithful if in addition the following is true for all
$X, Y \in \mathcal{V}$:

\[ d(p(X), p(Y)) = d(X, Y) \implies \exists\, g \in \mathcal{G} \text{ with } gX = p_*(X) \text{ and } gY = p_*(Y). \]

An appropriate notion of symmetry on $\mathcal{H}$ that is compatible with
$\mathcal{G}$-invariance on $\mathcal{V}$ is as follows. A function
$f : \mathcal{H} \to \overline{\mathbf{R}}$ is $p$-symmetric whenever

\[ f(p^*(x)) = f(x) \quad \text{for any } x \in \mathcal{H}. \]

In the spectral example, $\mathbf{S}^n$ faithfully reduces to $\mathbf{R}^n$ as a
consequence of Theorem 3.1. $\mathcal{G}$-invariant functions are what we called
spectral, while $\lambda$-symmetric functions are what we called symmetric. The other
three running examples are analogous.

Lemma A.2 (Invariance and symmetry). The following three properties of a function
$F : \mathcal{V} \to \overline{\mathbf{R}}$ are equivalent.

(i) $F$ is $\mathcal{G}$-invariant.

(ii) $F = f \circ p$ for some $p$-symmetric function $f$ on $\mathcal{H}$.

(iii) $F = (F^*)^*$.

Proof. Suppose that (i) holds and define $f := F^*$. Then observe
$f^*(X) = F \circ p_*(X)$. By the orbit preservation property, there exists some
$g \in \mathcal{G}$ satisfying $gX = p_*(X)$ and hence $f^*(X) = F(X)$ for all
$X \in \mathcal{V}$. Hence implication (i) $\Rightarrow$ (iii) holds. Suppose now that
(iii) holds, meaning $F(X) = F^*(p(X))$ for all $X \in \mathcal{V}$. Hence, in
particular
$F^*(y) = F(i(y)) = (F^*)^*(i(y)) = F^*(p \circ i(y)) = F^*(p^*(y))$ for all
$y \in \mathcal{H}$. By definition then $F^*$ is $p$-symmetric and (ii) follows. The
final implication (ii) $\Rightarrow$ (i) is trivial since $p$ is
$\mathcal{G}$-invariant. □

For notational convenience, henceforth, for any point $y \in \mathcal{H}$ the
corresponding capital letter $Y$ will stand for $i(y)$. Observe that the Moreau
envelopes and proximal mappings of functions on $\mathcal{V}$ and on $\mathcal{H}$
have obvious meanings. A proof nearly identical to that of Theorem 4.1 shows that if
$\mathcal{V}$ metrically reduces to $\mathcal{H}$, then for any lsc $p$-symmetric
function $f : \mathcal{H} \to \overline{\mathbf{R}}$ the commutativity relation
holds:

\[ (f^*)_\alpha = (f_\alpha)^*. \]

Assuming in addition that the reduction is faithful, the equation holds:

\[ P_\alpha f^*(X) = \{ g^{-1}Y : y \in P_\alpha f(p(X)),\ g \in \mathcal{G}_X \}, \]

where

\[ \mathcal{G}_X := \{ g \in \mathcal{G} : p_*(X) = gX \}. \]

Moreover, for any $Z \in P_\alpha f^*(X)$ there exists $g \in \mathcal{G}$ satisfying
$p_*(Z) = gZ$ and $p_*(X) = gX$. Suppose moreover that $\mathcal{V}$ and
$\mathcal{H}$ are Euclidean spaces with $i$ a linear mapping, and that $\mathcal{G}$
is a compact subgroup of linear isometries. Then a proof identical to that of
Theorem 4.2 shows that the following formula holds:

\[ \partial f^*(X) = \{ g^{-1}V : v \in \partial f(p(X)),\ g \in \mathcal{G}_X \}. \]

The four running examples of the section fit nicely into this framework.
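For the third running example (real rectangular matrices and singular values), the statements above specialize to familiar computations. The sketch below assumes NumPy and takes, as an example of our choosing, $f$ equal to the $\ell_1$ norm on $\mathbf{R}^m$, so that $f^* = f \circ \sigma$ is the nuclear norm and the proximal formula becomes singular value soft-thresholding. The helper names are ours.

```python
import numpy as np

def sigma(X):
    # Singular values in nonincreasing order (NumPy returns them sorted).
    return np.linalg.svd(X, compute_uv=False)

def nuclear_prox(X, alpha):
    # Singular value soft-thresholding: U Diag(max(sigma - alpha, 0)) V^T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - alpha, 0.0)) @ Vt

def objective(Y, X, alpha):
    # The function of Y minimized in the Moreau envelope of the nuclear norm at X.
    return np.sum(sigma(Y)) + np.linalg.norm(Y - X) ** 2 / (2.0 * alpha)

def l1_envelope(x, alpha):
    # Moreau envelope of the l1 norm on R^m, via coordinatewise soft-thresholding.
    y = np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)
    return np.sum(np.abs(y)) + np.sum((y - x) ** 2) / (2.0 * alpha)

rng = np.random.default_rng(6)
X = rng.standard_normal((3, 5))
Y = rng.standard_normal((3, 5))
alpha = 0.5
P = nuclear_prox(X, alpha)
```

Here `objective(P, X, alpha)` should match `l1_envelope(sigma(X), alpha)`, the singular value analogue of the commutativity relation, and the non-expansiveness condition reads `norm(sigma(X) - sigma(Y)) <= norm(X - Y)`.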